Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection

Authors

Abstract

The rapid development of facial manipulation techniques has aroused public concerns in recent years. Existing deepfake video detection approaches attempt to capture the discriminative features between real and fake faces based on temporal modelling. However, these works impose supervision on sparsely sampled video frames but overlook the local motions among adjacent frames, which instead encode rich inconsistency information that can serve as an efficient indicator for DeepFake video detection. To mitigate this issue, we delve into the local motion and propose a novel sampling unit named snippet, which contains a few successive video frames for local temporal inconsistency learning. Moreover, we elaborately design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter-SIM) to establish a dynamic inconsistency modelling framework. Specifically, the Intra-SIM applies bi-directional temporal difference operations and a learnable convolution kernel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote cross-snippet information interaction to form global representations. The Intra-SIM and Inter-SIM work in an alternate manner and can be plugged into existing 2D CNNs. Our method outperforms the state-of-the-art competitors on four popular benchmark datasets, i.e., FaceForensics++, Celeb-DF, DFDC and WildDeepfake. Besides, extensive experiments and visualizations are also presented to further illustrate its effectiveness.
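
As a rough illustration of the snippet idea described in the abstract, the Python sketch below splits a video into evenly spaced snippets of consecutive frame indices and computes forward and backward frame-to-frame differences within one snippet. The function names, the centred placement of each snippet inside its segment, and the use of plain lists as per-frame features are assumptions of this sketch, not details taken from the paper. The learnable convolution kernel of Intra-SIM and the cross-snippet interaction of Inter-SIM are not modelled here.

```python
def sample_snippets(num_frames, num_snippets, snippet_len):
    """Return num_snippets lists of snippet_len consecutive frame indices.

    Hypothetical helper: the video is cut into equal segments and one
    snippet is centred in each segment (an assumption of this sketch).
    """
    if num_snippets * snippet_len > num_frames:
        raise ValueError("video too short for the requested snippets")
    segment = num_frames // num_snippets  # frames available per segment
    snippets = []
    for u in range(num_snippets):
        start = u * segment + (segment - snippet_len) // 2
        snippets.append(list(range(start, start + snippet_len)))
    return snippets


def bidirectional_differences(features):
    """Forward and backward differences between adjacent frames of one snippet.

    features: list of per-frame feature vectors (lists of floats).
    Without learnable layers the backward difference is simply the
    negated forward difference; a real module would process the two
    directions with separate learnable operations.
    """
    pairs = list(zip(features[:-1], features[1:]))
    forward = [[b - a for a, b in zip(prev, nxt)] for prev, nxt in pairs]
    backward = [[a - b for a, b in zip(prev, nxt)] for prev, nxt in pairs]
    return forward, backward
```

For example, `sample_snippets(32, 4, 4)` yields four snippets of four adjacent frames each, in contrast to sparse sampling, which would pick isolated frames and lose the motion between neighbours.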

Similar resources

Delving Deeper into Convolutional Networks for Learning Video Representations

We propose an approach to learn spatio-temporal features in videos from intermediate visual representations we call “percepts” using Gated-Recurrent-Unit Recurrent Networks (GRUs). Our method relies on percepts that are extracted from all levels of a deep convolutional network trained on the large ImageNet dataset. While high-level percepts contain highly discriminative information, they tend t...

Delving Deeper into Convolution Networks for Learning Video Representation

Video analysis and understanding represents a major challenge for computer vision and machine learning research. While previous work has traditionally relied on hand-crafted and task-specific representations, there is a growing interest in designing general video representations that could help solve tasks in video understanding such as human action recognition, video retrieval or video caption...

Cascade R-CNN: Delving into High Quality Object Detection

In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector trained with a low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade as the IoU threshold increases. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing...

The Effects of Integrating Cooperative Learning into Vocabulary Learning of Elementary School Students

The purpose of the research is to examine whether integrating cooperative learning into vocabulary learning helps to increase the word recognition of students in an elementary school in Iran. It investigates whether the cooperative learning approach enables students to improve their language learning. The study used STAD (Student Teams-Achievement Divisions) as the cooperative model. ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i1.19955